Support prometheus metrics #73

7ing · 2024-11-22T22:32:55Z

certmanager_csi_certificate_request_expiration_timestamp_seconds
certmanager_csi_certificate_request_ready_status
certmanager_csi_certificate_request_renewal_timestamp_seconds
certmanager_csi_driver_issue_call_count_total
certmanager_csi_driver_issue_error_count_total
certmanager_csi_managed_certificate_count_total
certmanager_csi_managed_volume_count_total

fixes: #60

cert-manager-prow · 2024-11-22T22:33:05Z

Hi @7ing. Thanks for your PR.

I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

erikgb · 2024-11-22T22:40:08Z

/ok-to-test

munnerz

This looks great, thanks Jing 🙌 the integration and unit tests here make it far easier to review confidently!

My main questions/concerns are around the construction logic in the metrics subpackage, which I think we need to decouple from net.Listener (and allow more flexibility for projects that already have their own prometheus.Registry they'd like to re-use).

munnerz · 2025-01-08T14:20:55Z

metrics/metrics.go

+}
+
+// NewServer registers Prometheus metrics and returns a new Prometheus metrics HTTP server.
+func (m *Metrics) NewServer(ln net.Listener) *http.Server {


This isn't called outside of test cases, and I guess that is by design as it is expected that the corresponding implementation should call NewServer on metrics.Metrics to register their Listener.

Could you possibly update the example/ implementation in the root of this repository to demonstrate how to actually add the /metrics endpoint? I also wonder if the Managed should be extended to be able to auto-serve this endpoint in cases where a user doesn't need to provide their own listener (but does want metrics to be served).

What if we renamed this to Register(*prometheus.Registry) rather than tying the net.Listener logic into the registration?

We can always have/find some kind of csihelpers.Handle(*http.Server, *prometheus.Registry) function elsewhere then, which needn't be opinionated about csi-lib.

> Could you possibly update the example/ implementation in the root of this repository to demonstrate how to actually add the /metrics endpoint?

Yes, it is there, and I have updated it to match the recent changes:
https://github.com/7ing/csi-lib/blob/abf15631238fa809d10d9c206178c414d9495b4f/test/integration/metrics_test.go#L83-L95

Thanks for the suggestions. I have switched to a DefaultHandler instead, which can serve as a reference implementation of an HTTP handler.

munnerz · 2025-01-08T14:21:54Z

metrics/metrics.go

+	)
+
+	// Create Registry and register the recommended collectors
+	registry := prometheus.NewRegistry()


Could there be cases where a user wants to provide their own Registry and have these metrics registered into it, so they can be served alongside driver-specific metrics? This would make the library more composable with existing drivers that may serve their own metrics already.

Good idea, code changed per your suggestion.

munnerz · 2025-01-08T14:23:59Z

metrics/metrics.go

+	registry := prometheus.NewRegistry()
+	registry.MustRegister(
+		collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
+		collectors.NewGoCollector(),


I feel like these should be auto-registered, but I think the Metrics struct itself should be able to be set up so that it doesn't embed these Go metrics (i.e. so it only adds the csi-lib specific metrics to the registry).

Perhaps we need some 'basic' constructor function for the csi-lib metrics, with some form of NewRecommendedMetrics function that calls the basic one, but also adds in standard Go-esque metrics? Not sure :)

Agreed, I will remove the default registry and let the caller pass their own Registry instead.

7ing · 2025-02-05T00:30:18Z

@munnerz Thank you for your valuable input. Sorry it took so long to make the change. I believe this version addresses most of your concerns.

cert-manager-prow · 2025-05-30T18:39:43Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joshvanl for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

inteon · 2025-06-09T10:05:22Z

test/integration/metrics_test.go

+
+	// Should expose that CertificateRequest as ready with expiry and renewal time
+	// node="f56fd9f8b" is the hash value of "test-node" defined in driver_testing.go
+	expectedOutputTemplate := `# HELP certmanager_csi_certificate_request_expiration_timestamp_seconds The date after which the certificate request expires. Expressed as a Unix Epoch Time.


Will certmanager_csi_... be the prefix for the metrics when exposed in https://github.com/cert-manager/csi-driver?
Can we make sure they are all using similar names? Does this happen automatically?
Could you provide an example of before and after for https://github.com/cert-manager/csi-driver with these changes applied?

Thank you @inteon for your review.

Yes, certmanager_csi_... will be the prefix for the metrics in the csi-driver. It is part of the definition in:
https://github.com/7ing/csi-lib/blob/b9186bad5b6f9af9bc93dbd28571d2d9219700e6/metrics/metrics.go#L27-L31

We chose this name based on the cert-manager definition: https://github.com/cert-manager/cert-manager/blob/5e09ef6c0552df0bde64746c735cb1ff324b6261/pkg/metrics/metrics.go#L44
All cert-manager controllers use the certmanager_... prefix.

Currently, https://github.com/cert-manager/csi-driver does not expose any cert-manager related metrics. It only serves k8s component metrics, such as CPU and memory usage. That's why this PR exists. This test file shows the expected output for the cert-manager metrics (in addition to the k8s metrics, depending on the driver configuration).

erikgb · 2025-08-29T17:36:10Z

@7ing We have made some major dependency upgrades in this module. Are you able to rebase your PR, preparing for another round of review? Sorry for the inconvenience and for the delays in review. 😒

Following metrics added:
certmanager_csi_certificate_request_expiration_timestamp_seconds
certmanager_csi_certificate_request_ready_status
certmanager_csi_certificate_request_renewal_timestamp_seconds
certmanager_csi_driver_issue_call_count
certmanager_csi_driver_issue_error_count
certmanager_csi_managed_certificate_count
certmanager_csi_managed_volume_count

fixes: cert-manager#60
Signed-off-by: Jing Liu <[email protected]>

7ing · 2025-08-29T18:36:31Z

/retest

7ing · 2025-08-29T18:38:24Z

@erikgb done with rebase

erikgb

This is great stuff! Thanks @7ing! I did my first pass on this PR now, and I think I would prefer using a Prometheus collector to avoid adding and removing metrics that are derived from API resources. Please take a look and let me know what you think! It's not a blocker, but a pattern we are adopting in cert-manager projects nowadays.

erikgb · 2025-08-30T08:43:40Z

manager/manager.go

+	if opts.Metrics == nil {
+		opts.Metrics = metrics.New(opts.Log, prometheus.NewRegistry())
+	}


I think we should avoid creating a new Prometheus registry if metrics are not enabled. If I am not mistaken, this will only waste CPU and memory. I would suggest removing the DefaultHandler function, making the user of csi-lib responsible for managing the handler on top of the registry supplied to csi-lib. This will probably require some changes to the code producing metrics. Perhaps an interface (with a no-op implementation) could be used to avoid cluttering the functional code?
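One way to read the "interface with a no-op implementation" idea is sketched below. Every name here (`Recorder`, `noopRecorder`, `countingRecorder`, `Options`, `recorderOrNoop`, `issue`) is hypothetical, and the counting type merely stands in for a Prometheus-backed implementation. The point is that the functional code calls the interface unconditionally and no registry is created when metrics are disabled.

```go
package main

import "fmt"

// Recorder is a hypothetical interface covering the metric updates made by the manager.
type Recorder interface {
	IncIssueCall(node, volume string)
	IncIssueError(node, volume string)
}

// noopRecorder discards every update; it is used when metrics are disabled.
type noopRecorder struct{}

func (noopRecorder) IncIssueCall(string, string)  {}
func (noopRecorder) IncIssueError(string, string) {}

// countingRecorder is a toy implementation standing in for a Prometheus-backed one.
type countingRecorder struct {
	calls, errors int
}

func (c *countingRecorder) IncIssueCall(string, string)  { c.calls++ }
func (c *countingRecorder) IncIssueError(string, string) { c.errors++ }

// Options mirrors the idea of manager options with an optional recorder.
type Options struct {
	Metrics Recorder
}

// recorderOrNoop falls back to the no-op recorder instead of building a registry.
func recorderOrNoop(opts Options) Recorder {
	if opts.Metrics == nil {
		return noopRecorder{}
	}
	return opts.Metrics
}

// issue shows functional code that never checks whether metrics are enabled.
func issue(r Recorder, fail bool) {
	r.IncIssueCall("node-a", "volume-1")
	if fail {
		r.IncIssueError("node-a", "volume-1")
	}
}

func main() {
	counting := &countingRecorder{}
	issue(recorderOrNoop(Options{Metrics: counting}), true)
	fmt.Println(counting.calls, counting.errors)

	// With no recorder configured, the same call path runs and records nothing.
	issue(recorderOrNoop(Options{}), true)
}
```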

erikgb · 2025-08-30T08:53:05Z

manager/manager.go

 	}

 	actualDuration := crt.NotAfter.Sub(crt.NotBefore)

 	renewBeforeNotAfter := actualDuration / 3

-	return crt.NotAfter.Add(-renewBeforeNotAfter), nil
+	return crt.NotAfter, crt.NotAfter.Add(-renewBeforeNotAfter), nil


Suggested change

-	return crt.NotAfter, crt.NotAfter.Add(-renewBeforeNotAfter), nil
+	return crt.NotAfter, crt.NotAfter.Sub(renewBeforeNotAfter), nil

Nit: I find this a bit easier to read.

erikgb · 2025-08-30T08:56:58Z

manager/manager.go

+// getExpiryAndDefaultNextIssuanceTime will return the certificate expiry time, together with
+// default time at which the certificate should be renewed by the driver- 2/3rds through its
+// lifetime (NotAfter - NotBefore).
+func getExpiryAndDefaultNextIssuanceTime(chain []byte) (time.Time, time.Time, error) {


I am not a huge fan of function names containing the And term. Isn't the expiry timestamp the most important return param now? Could we just name it something like calculateExpiryTime? Open for better suggestions, but preferably without the "and". 😆

erikgb · 2025-08-30T09:01:21Z

metrics/certificaterequest.go

+	m.certificateRequestExpiryTimeSeconds.DeletePartialMatch(prometheus.Labels{"name": name, "namespace": namespace})
+	m.certificateRequestRenewalTimeSeconds.DeletePartialMatch(prometheus.Labels{"name": name, "namespace": namespace})
+	m.certificateRequestReadyStatus.DeletePartialMatch(prometheus.Labels{"name": name, "namespace": namespace})
+}


I think we should use a Prometheus Collector for metrics derived from API resources. This will prevent bugs like the one fixed by cert-manager/cert-manager#7856. API resources should be cached in all controllers, so just querying the API, which should hit cache, is not a problem.

cert-manager-prow bot added the dco-signoff: yes label (indicates that all commits in the pull request have the valid DCO sign-off message) Nov 22, 2024

cert-manager-prow bot added the needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test) and size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) labels Nov 22, 2024

cert-manager-prow bot added the ok-to-test label and removed the needs-ok-to-test label Nov 22, 2024

7ing force-pushed the metrics branch from c381e55 to 5f85df6 Compare December 3, 2024 00:28

munnerz reviewed Jan 8, 2025

7ing requested a review from munnerz February 13, 2025 21:14

7ing force-pushed the metrics branch from abf1563 to 104e7b0 Compare May 30, 2025 18:39

cert-manager-prow bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) and removed the size/XL label May 30, 2025

wallrj self-requested a review June 6, 2025 20:26

inteon reviewed Jun 9, 2025

wallrj removed their request for review June 20, 2025 09:58

7ing added 3 commits August 29, 2025 11:05

Resolve comments for metrics package

fc8bcc0

fixes: cert-manager#60 Signed-off-by: Jing Liu <[email protected]>

fix promlinter and boilerplate issues

2b533b0

Signed-off-by: Jing Liu <[email protected]>

7ing force-pushed the metrics branch from b9186ba to 2b533b0 Compare August 29, 2025 18:13

fix minor golangci-lint lissue

99834d6

Signed-off-by: Jing Liu <[email protected]>

erikgb reviewed Aug 30, 2025
